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Abstract 

In this article we consider the approximation of expectations w.r.t. probability distributions asso¬ 
ciated to the solution of partial differential equations (PDEs); this scenario appears routinely in 
Bayesian inverse problems. In practice, one often has to solve the associated PDE numerically, 
using, for instance hnite element methods and leading to a discretisation bias, with the step-size 
level hp- In addition, the expectation cannot be computed analytically and one often resorts to 
Monte Carlo methods. In the context of this problem, it is known that the introduction of the mul¬ 
tilevel Monte Carlo (MLMC) method can reduce the amount of computational effort to estimate 
expectations, for a given level of error. This is achieved via a telescoping identity associated to 
a Monte Carlo approximation of a sequence of probability distributions with discretisation levels 
oo > > hi ■■■ > hp- In many practical problems of interest, one cannot achieve an i.i.d. sam¬ 

pling of the associated sequence of probability distributions. A sequential Monte Carlo (SMC) 
version of the MLMC method is introduced to deal with this problem. It is shown that under 
appropriate assumptions, the attractive property of a reduction of the amount of computational 
effort to estimate expectations, for a given level of error, can be maintained within the SMC 
context. The approach is numerically illustrated on a Bayesian inverse problem. 
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1. Introduction 

Consider a sequence of probability measures {rii}i>o on a common measurable space (E, T); we 
assume that the probabilities have common dominating hnite-measure du and write the densities 
w.r.t. du as r]i = rii{u). In particular, for some known 'ji : E ^ we let 

/ N Uliu) /.N 

Vi{u) = ( 1 ) 

where the normalizing constant Zi = J^'ji(u)du may be unknown. The context of interest is when 
the sequence of densities is associated to an ‘accuracy’ parameter hi, with h; —)■ 0 as / —)■ cx) with 
oo > ho > hi >■■■ > hoc = 0. This set-up is relevant to the context of discretised numerical 
approximations of continuum helds, as we will explain below. The objective is to compute: 

^Vaoiaiu)] := / g{u)rioc{u)du 
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for potentially many measurable ? 7 oo—integrable functions g : E ^ W. In practice one cannot treat 
hi = 0 and must consider these distributions with hi > 0. 

Problems involving numerical approximations of continuum helds are discretized before being 
solved numerically. Finer-resolution solutions are more expensive to compute than coarser ones. 
Such discretizations naturally give rise to hierarchies of resolutions via the use of nested meshes. 
Successive solution at rehned meshes can be utilized to mitigate the number of necessary solves 
for the finest resolutions. For the solution of linear systems, the coarsened systems are solved as 
pre-conditioners within the framework of iterative linear solvers in order to reduce the condition 
number, and hence the number of necessary iterations at the hner resolution. This is the principle 
of multi-grid methods. For Monte Carlo methods, as in the context above, a telescoping sum of 
associated differences at successive rehnement levels can be utilized. This is so that the bias of the 
resulting multilevel estimator is determined by the hnest level but the variance of the estimators 
of the differences decays. The reduction in the variance at hner levels implies that the number 
of samples required to reach a given error tolerance is also reduced with increasing resolution. 
This procedure is then optimized to balance the extra per-sample cost at the hner levels. Overall 
one can obtain a method with smaller computational ehort to reach a pre-determined error than 
applying a standard Monte Carlo method immediately at the hnest resolution [12]. 

Multi Level Monte Carlo (MLMC) [12] (see also [13]) methods are such that one typically sets 
an error threshold for a target expectation, and then sets out to attain an estimator with the 
prescribed error utilizing an optimal allocation of Monte Carlo resources. Within the context of 
[12, 14], the continuum problem is a stochastic diherential equation (SDE) or PDF with random 
coefficients, and the target quantity is an expectation of a functional, say g : E ^ M., of the 
parameter of interest U ^ E, over an ideal measure U ~ r]oo that avoids discretisation. The 
levels are a hierarchy of rehned approximations of the function-space, specihed in terms of a 
small resolution parameter say hi, for 0 < / < L, thus giving rise to a corresponding sequence of 
approximate laws rji. The method uses the telescopic sum 

L 

E,„|!,((7)] = E,MU)] + 5;{E,„|9((7)] - E„_.|9(C/)]} 

1=1 

and proceeds by coupling the consecutive probability distributions rji-i, rji- Thus, the expectations 
are estimated via the standard unbiased Monte Carlo averages 

Ni 

If' = Efsidb - 

i=\ 

where are i.i.d. samples, with marginal laws rji-i, rji, respectively, carefully constructed 

on a joint probability space. This is repeated independently for 0 < / < L. The overall multilevel 
estimator will be 

L 

= , ( 2 ) 

/=0 

under the convention that g{U^l) = 0. A simple error analysis gives that the mean squared error 
(MSE) is 

E{W, Multi - ^nJgmV = Multi - ^riAgmV + {^vAgm - E,JgiU)]A . (3) 

variance bias 
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One can now optimally allocate Nq, Ni,..., to minimize the variance term Vi/Ni for fixed 
computational cost ^iNi, where Vi is the variance of [g{Ui^'^) — and Ci the computa¬ 

tional cost for its realisation. Using Lagrange multipliers for the above constrained optimisation, 
we get the optimal allocation of resources Ni oc \/Vi/Ci. In more detail, the typical chronology is 
that one targets an MSE, say 0{e^), then (i) given a characterisation of the bias as an order of hi, 
one determines hi = M~\ I = 0,1,..., L, for some integer M > 1, and chooses a horizon L such 
that the bias is 0{e^) and (ii) given a characterisation of U, Ci as some orders of hi, one optimizes 
the required samples Nq, ... Nl needed to give variance 0{e^). Thus, a specihcation of the bias, 
variance and computational costs as functions of hi is needed. 

As a prototypical example of the above setting [12], consider the case U = X{T) with X{T) 
being the terminal position of the solution X of a SDE and r]i is the distribution of X{T) under 
the consideration of a numerical approximation with time-step At; = hi. The laws f]i-i, rji can be 
coupled via use of the same driving Brownian path. Invoking the relevant error analysis for SDE 
models, one can obtain (for U ~ ?7oo, Ui ^ Vi, and dehned on the common probability space): 

(i) weak error \E,[g{Ui) — g{U)]\ = 0{h'j"), providing the bias 0{h'j"), 

(ii) strong error, K\g{Ui) — giU)]"^ = 0{hf), giving the variance V) = 0{h^), 

(hi) computational cost for a realisation of g{Ui) — g{Ui-i), Ci = 0{hj^), 

for some constants a,/3,( related to the details of the discretisation method. The standard Euler 
Marayuma method for solution of SDE gives the orders a = /3 = ( = 1. 

Assuming a general context, given such rates for bias, V) and Ci, one proceeds as follows. Recall 
that hi = for some integer M > 1. Then, targeting an error tolerance of e and letting 

hi = = 0{e), one has L = log(e“^)/(Q;log(M)) -|- 0(1), as in [12]. Using the optimal 

allocation Ni oc ^/Vi/Ci, one hnds that Ni oc . Taking under consideration a target error of 

size 0{e), one sets TV; oc with Kl chosen to control the total error for increasing L. 

Thus, for the resulted estimator in (2)-(3), we have: 

Variance = ^ ^ ; 

1=0 1=0 

L 

Comp. Cost = ^ NiCi = Kle~^ . 

1=0 

To have a variance of one sets may or may not depend on e 

depending on whether this sum converges or not (recalling that L = 0(1 log(e)|)). In the case 
of Euler-Marayuma, for example, (3 = (, Kl = L, and the cost is 0(log(e)^e“^), versus 0(e~^) 
using a single level with mesh-size hi = 0(e). It /3 > (, corresponding for instance to the Milstein 
method, then the cost is 0(e“^). The latter is the cost of obtaining the given level of error for a 
scalar random variable, and is therefore optimal. The worst scenario is when /3 < (. In this case it 
is sufficient to set Kl = h^L to make the variance O(e^), and then the number of samples on 
the hnest level is given by iV^ = whereas the total algorithmic cost is 0{€~^3/o‘+^)'j^ where 

6 = 2 — 13/a > 0. In this case, one can choose the largest value for the bias, a = (3/2, so that 
Nl = 1 and the total cost, is dominated by this single sample. See [12] for more details. 

It is important to note that the realizations for a given increment must be coupled 

to obtain decaying variances U;. In the case of an SDE driven by Brownian motion one can simply 
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simulate the driving noise on level I and then upscale it to level / — 1 by summing elements of 
the hner path [12], For the case of a PDF forward model relying on uncertain input the scenario 
is quite similar [5]. For example, in the case that the input is of hxed dimension and the levels 
arise due to discretization of the forward map alone within a hnite element context, one would 
use the same realization of the inpnt on two separate meshes for a pairwise-conpled realization. 
Note that in the more general context of PDF, it is natural to decompose ( = d-'y, where d is the 
spatio-temporal dimension of the nnderlying continnnm. In particular, the number of degrees of 
freedom of a d—dimensional held approximated on a mesh of diameter hi is given by Then, 
the forward solve associated to the evalnation of g{Ui) may range from linear (7 = 1 ) to cnbic 
(7 = 3) in the nnmber of degrees of freedom. For example, the solution of an SDF or a sparse 
matrix-vector multiplication give 7 = 1 , a dense matrix-vector multiplication would give 7 = 2, 
and direct linear solve by Gaussian elimination would give 7 = 8 . 

The present work will focus on the case of an inverse problem with hxed-dimensional input. 
Indeed the difficulty arises here because we only know how to evaluate (np-to a constant) the target 
density at any given level, and cannot directly obtain independent samples from it. There exist 
many approaches to solving snch problem, for example one can review the recent works [14, 15] 
which use Markov chain Monte Carlo (MCMC) methods in the mnltilevel framework. In this 
article a more natnral and powerfnl formulation is considered, related with the use of Sequential 
Monte Carlo approaches. 

Seqnential Monte Carlo (SMC) methods are amongst the most widely used computational 
techniqnes in statistics, engineering, physics, finance and many other disciplines. In particular 
SMC samplers [ 8 ] are designed to approximate a sequence {? 7 z}z>o of probability distribntions on 
a common space, whose densities are only known np-to a normalising constant. The method uses 
> 1 samples (or particles) that are generated in parallel, and are propagated with importance 
sampling (often) via MCMC and resampling methods. Several convergence results, as N grows, 
have been proved (see e.g. [ 2 , 6 , 7, 10]). SMC samplers have also recently been proven to be 
stable in certain high-dimensional contexts [1]. Current state of the art for the analysis of SMC 
algorithms inclnde the work of [2, 3, 6 , 7, 10]. In this work, the method of SMC samplers is 
perfectly designed to approximate the seqnence of distribntions, bnt as we will see, implementing 
the standard telescoping identity of MLMC requires some ingenuity. In addition, in order to 
consider the benefit of using SMC, one must analyze the variance of the estimate; in such scenarios 
this is not a trivial extension of the convergence analysis previously mentioned. In particular, one 
must very precisely consider the anto-covariance of the SMC approximations and consider the rate 
of decrease of this qnantity as the time-lag between SMC approximations increases. Snch a precise 
analysis does not appear to exist in the literatnre. We note that our work, whilst presented in 
the context of PDFs, is not restricted to such scenarios and, indeed can be applied in almost any 
other similar context (that is, a seqnence of distribntions on a common space, with increasing 
compntational costs associated to the evalnation of the densities which in some sense converge to 
a given density); however, the potential beneht of doing so, may not be obvious in general. 

This article is strnctnred as follows. In Section 2 the ML identity and SMC algorithm are 
given. In Section 3 onr main complexity result is given nnder assnmptions and their implications 
are discussed. In Section 4 we give a context where the assnmptions of onr theoretical resnlts can 
be verihed. In Section 5 our approach is numerically demonstrated on a Bayesian inverse problem. 
Section 3 and the Appendix provide the proofs of onr main theorem. 
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2. Sequential Monte Carlo Methods 

2.1. Notations 

Let {E,S) be a measurable space. The notation Bh{E) denotes the class of bounded and 
measurable real-valued functions. The supremum norm is written as ||/||oo = sup„g^|/(u)| and 
V{E) is the set of probability measures on {E,S). We will consider non-negative operators K : 
E X S ^ M+ such that for each u ^ E the mapping A K{u, A) is a hnite non-negative measure 
on S and for each A G B the function u h-)■ K{u,A) is measurable; the kernel K is Markovian 
if K{u,dv) is a probability measure for every u E E. For a hnite measure /i on {E,S), and a 
real-valued, measurable / : i? —)■ M, we dehne the operations: 

jj,K ■. A^ j K{u,A) ^{du) ; Kf :«!—)■ j f{v) K{u, dv). 

We also write /i(/) = / f{u)fi{du). In addition || ■ r > 1, denotes the norm, where the 
expectation is w.r.t. the law of the appropriate simulated algorithm. 

2.2. Algorithm 

As described in Section 1, the context of interest is when a sequence of densities {77i}i>o, as in 
(1), are associated to an ‘accuracy’ parameter hi, with h; —)■ 0 as / —)■ cx), such that oo > ho > 
hi - ■ ■ > hoc = 0. In practice one cannot treat hoc = 0 and so must consider these distributions 
with hi > 0. The laws with large hi are easy to sample from with low computational cost, but 
are very different from r]oo, whereas, those distributions with small hi are hard to sample with 
relatively high computational cost, but are closer to r]oo. Thus, we choose a maximum level T > 1 
and we will estimate 

^riAaiU)] := / g{u)riL{u)du . 

J E 

By the standard telescoping identity used in MLMC, one has 

L 

E„.|9(C')1 = E„[g(C/)] + {E„|9(t;)] - E„_.[9(t;)]} 

1=1 

= E*[g(t/)l + |;E,„_.[(2d^_l)g(C/)] , (4) 

Suppose now that one applies an SMC sampler [8] to obtain a collection of samples (particles) 
that sequentially approximate rjo,rji,... ,r]L. We consider the case when one initializes the popula¬ 
tion of particles by sampling i.i.d. from r]o, then at every step resamples and applies a MCMC kernel 
to mutate the particles. We denote by ..., with -|-cx) > Nq > Ni > ■ ■ ■ > 1, 

the samples after mutation; one resamples according to the weights Gi{Ul) = (7;+i/7i)(17/), 
for indices / G {0,..., L — 1}. We will denote by the sequence of MCMC kernels used 

at stages 1,..., L — 1, such that rjiMi = rji. For 99 : i? —)■ M, / G {1,..., L}, we have the following 
estimator of 

Ni-i 

riA'iv) = J^YXUL) ■ 

1=1 

We dehne 

Ni-i 

nN(Gi-iM,(du,)) = jNY Gi-i(UL)M,(UUdu,) . 

^ 1 _ 1 
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The joint probability distribution for the SMC algorithm is 


No 


L-l Ni Ni.i 


nn 


2=1 


/ = 1 2 = 1 




If one considers one more step in the above procedure, that would deliver samples a 

standard SMC sampler estimate of the quantity of interest in (4) is r]^{g)] the earlier samples are 
discarded. Within a multilevel context, a consistent SMC estimate of (4) is 


f = vS'°(s )+ 


1=1 




(Gm) 


-»,-r‘(9)} 


( 5 ) 


and this will be proven to be superior than the standard one, under assumptions. 

There are two important structural differences within the MLSMC context, compared to the 
standard ML implementation of [12], sketched in Section 1; 

i) the L + 1 terms in (5) are not unbiased estimates of the differences Krn[g{U)] — J 5 f(f/)], 

so the relevant MSE error decomposition here is: 

E[{y-E,„|9(l7)|}"] <2E[{y-E,,l9(i7)|}"] +2{E.„l9((7)|-E,„|!,(i7)|p . (6) 


ii) the same L + 1 estimates are not independent. Hence a substantially more complex error 
analysis will be required to characterise E[{y — E,rjj^[g{U)]}‘^]. In Section 3, we will obtain an 
expression for this discrepancy, which will be more involved than the standard '^f=oVi/Ni, 
but will still allow for a relevant constrained optimisation to determine the optimal allocation 
of particle sizes Ni along the levels. 

Given an appropriate classification of both terms on the R.H.S. of (6) as an order of the tolerance 
for a Bayesian Inverse Problem (to be described in Section 4), one can specify a level L, and 
optimal Monte-Carlo sample sizes Ni so that the MSE of Y is 0{e‘^) at a reduced computational 
cost. 


3. Development of multilevel SMC 

3.1. Main Result 

We will now obtain an analytical result that controls the error term E[{y — E^^ [ 5 f(f/)]}^] in 
expression (6). This is of general signihcance for the development of MLSMC in various contexts. 
Then, we will look in detail at an inverse problem context (developed in Section 4) and fully 
investigate the MLSMC method. 

For any I G {0, ..., L} and ip G Bb{E) we write: rii{(p) := J^ip{u)rii{u)du. We introduce the 
following assumptions, which will be verihable in some contexts. They are rather strong, but could 
be relaxed at condsiderable increase in the complexity of the arguments, which will ultimately 
provide the same information. In addition, the assumptions are standard in the literature of SMC 
methods; see [6, 7]. 

(Al) There exist 0 < C < C < -|-cx) such that 

supsupG«(M) < C ; 

l>l u&E 

inf inf GAu) > C . 

l>lu&E ^ ^ ~ 


6 



(A2) There exists a p G (0,1) such that for any I > 1, {u,v) G E‘^, A E S: 



Mi(u, du) > p 



Theorem 3.1. Assume (Al-2). There exist C < +cx) and k G (0,1) such that for any g G Bb{E), 
with IIpIIoo = 1 , 


E[{y - E,Jg(C/)]P] < C ^ 

+ ^ - l||oo||%^Gq_l 

l<l<q<L 



3.2. Proof of Theorem 3.1 

The following notations are adopted; this will substantially simplify subsequent expressions: 

^i-i — Ni_^ X 'h-i \.y) ) 


Cr^(G.-i) 

Yi_i = _ 7]i_i{g) ( = pi{g) - Vi-iia) ) , 

= {^Gi_i{u) - 1 ) , 

pi{u) = g{u)tpi{u) , 

An{p, N) = p^{pGn)/g^{Gn) , ^ G Bb{E) , 0 < n < L - 1 , 

PniTGn) 


( 7 ) 


An{p,N) = An{p,N) 


Vn(.Crr, 


( 8 ) 

(9) 


Throughout this Section, C is a constant whose value may change, but does not depend on any time 
parameters of the Feynman-Kac formula, nor Ni. The proof of Theorem 3.1 follows from several 
technical lemmas which are now given and supported by further results in the Appendix; the proof 
of the theorem is at the end of this subsection. It is useful to observe that Zi/Zi^i = p;_i(Gz_i), 
Pi-iijPi) = 0 and |A„((p, A)| < |(p|oo with probability 1 as the conditional Li-norm of functional 
p over a discrete distribution. We will make repeated use of the following identity which follows 
from these observations upon adding and subtracting piff^{-^g{-)Gi-i{-))\ 

- Yi.i = Az_i(p, W_i) {pi_i - - Vi-i}ipi) ■ (10) 


Lemma 3.1. Assume (Al-2). There exists a G < +cx) such that for any I > 1: 

Z,-i , 




C* 


-Gi., - 


12 
I oo 


N, 


i-i 
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Proof. From (10) and the C' 2 -inequality we obtain: 

< 211.1,Af,_i){>£‘ - >).-i}(w)lll + 2 


< 2 IK,"'-' - „_,}(w)li:^ + 2 iKCi- - 

By [6, Theorem 7.4.4] we have that both L 2 -norms are upper bounded by - ’'~ 2 nP\ - 

completes the proof. 

By the C' 2 -inequahty and standard properties of i.i.d. random variables one has: 

N ^ N 


Ni- 


This 

□ 


E 




i=i 


1=2 


We have that: 

N 


E 


N N 

[{- >;-i) 

1=2 1=2 


+ 2 E[(y^«--y,_ 0 (y 7 r-r,-i 

2<l<q<L 


Lemma 3.1 gives that: 


N 


L II Zi-i 


E 


<CY," 


Ni-i 


1=2 1=2 
thus it remains to treat the cross-interaction terms. Using the decomposition in (10), we obtain 

Y, E[(U;-' - «-.)(Ur‘ - n-o] = 

2<l<q<L 

= ^ E [Ai_i{g, N)Aq_i{g, - r/i_i}(^){r/^7' - Vq-i}{^) ] 

2<l<q<L 

+ ^ E [Ai_i{g,N){g^_!f^ - -r/q_i}(^)] 

2<l<q<L 

+ ^ E [Ag_i{g,N){g^_!f^ - r/i_i}((^i){hSr' - Vq-ijiP^)] 

2<l<q<L 

+ ^ E [ - Vl-l}i^l){Vqli' - Vq-l}i^q) ] ■ 

2<l<q<L 

We will now apply Proposition Appendix A.l to the relevant terms in the sum, to yield the 
upper-bound: 

q-l 


C ll^^lloo||i^l|oo{^ + 

_ _ ' T Z 1 


l<l<q<L 




From here one can conclude the proof of Theorem 3.1. 



3.3. MLSMC Variance Analysis 

This section considers the specification of parameters for the MLSMC algorithm after considera¬ 
tion of Theorem 3.1. Recall that in the simpler SDE setting of [12] one must work with the strong er¬ 
ror estimate = 0{hf) and the deduced variance V = 'Va,T[g{Ui)—g{Ui^i)] = 0{h^). 

From Theorem 3.1, a similar role within MLSMC is taken by: 

Vr.= - IWI . (11) 


We assume that in the given context one can obtain that Vi = 0{hf) for some appropriate rate 
constant (3 > 1. Recall that we have hi = M~\ for some integer M > 1 and we assume a 
bias of 0{h1). Thus, targeting an error tolerance of e, we have = M~^ = 0(e), so that L = 
log(e“^)/(Q; log(M))-|-C>(l). Now, to optimally allocate Nq, Ni ,..., Nl, one proceeds along the lines 
outlined in the Introduction under consideration of Theorem 3.1. Notice that < O— 

Z^q=L-\-\. — 1—K, 

and Vq is smaller than Vi (in terms of the obtained upper bounds), so the upper bound in Theorem 
3.1 can be bounded by: 


V(h[ ^ h^\ 


( 12 ) 


We also assume a computational cost proportional to , for some rate C ^ 1) with 

the resampling cost considered to to be negligible for practical purposes compared to the cost of 
the calculating the importance weights (as it is the case for the inverse problems we focus upon 
later). As with standard MLMC in [12], we need to find Nq, ... ,Nl that optimize (12) given a 
fixed computational cost J^iLo Such a constrained optimization with the complicated error 

bound in (12), results in the need to solve a quartic equation as a function of V and Q. Instead, 
one can assume that the second term on the R.H.S. of (12) is negligible, solve the constrained 
optimization ignoring that term, and then check that the effect of that term for the given choice of 
is smaller than 0(e'^). Following this approach gives a constrained optimisation problem 
identical to the simple case of [12], with solution Ni cx One works as in 

Section 1, and selects: 

Nl oc ~ . 

1=0 

Then returning to (12) one can check that indeed the extra summand is smaller than O(e^) for 
the above choice of A)- Notice that: (i) /Nq = 0(e‘^h/^^‘^/K l), and the sum X]q=z+i is 
dominated by = (!l(e“‘’/^^“i); (ii) we have (hf /Ni)^/"^ oc e/ . Therefore, 



1-C/(2q) 


1=0 



0(e2gi-C/(2a)) _ 


Thus, when C, < 2q;, the overall mean squared error is still 0(e^). In the inverse problem context 
of Section 4, we will establish that (3 = 2, a = (3/2. Also, in many cases (depending on the chosen 
PDF solver) we have ( = d. 


4. Bayesian Inverse Problem 

A context will now be introduced in which the results are of interest and the assumptions 
can be satisfied. We begin with another round of notations. Introduce the Gelfand triple V := 
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H^{D) C L^{D) C H~^{D) =: V*, where the domain D will be understood. Furthermore, denote 
t’Y (■) ■)) II ■ II the inner product and norm on with superscripts to denote the corresponding inner 

product and norm on the Hilbert spaces V and V*. Denote the hnite dimensional Euclidean inner 

product and norms as (•, •), | ■ |, with the latter also representing size of a set and absolute value, 
and denote weighted norms by adding a subscript as {,-,■)a ■= with corresponding 

norms | ■ |yi or || ■ ||^ for Euclidean and spaces, respectively (for symmetric, positive dehnite 
A with Ha being the unique symmetric square root). In the following, the generic constant C 
will be used for the right-hand side of inequalities as necessary, its precise value actually changing 
between usage. 

Let D with dD G convex. For f d V*, consider the following PDE on D: 

— V • (uVp) = / , on D , (13) 

p = 0 , on dD , (14) 


where: 


K 

^X) = U{x) + '^Uk(Tk(f>k{x) . 
k=l 


(15) 


Dehne u = {uk}k=ii with Uk ~ U[—l, 1] i.i.d. This determines the prior distribution for u. Assume 
that u,<pk £ for all k and that ||0fc||oo = L In particular, assume {ak}k=i decay^ with k. The 
state space is E = !]• H I® important that the following property holds: 


inf u(a;) 

X 


> 


inf u{x) 

X 


K 

k=l 


(Jfc > M* > 0 


SO that the operator on the left-hand side of (13) is uniformly elliptic. Let p{-;u) denote the weak 
solution of (13) for parameter value u. Dehne the following the vector-valued function 

S{p) = [9i{p),--- ,9 m{p)V 1 


where pm are elements of the dual space E* for m = 1,..., M. It is assumed that the data take 
the form 

y = Q{,p)+i, ^~A^(0,F), (16) 

where A^(0, F) denotes the Gaussian random variable with mean 0 and covariance F, and T denotes 
independence. The unnormalized density then is given by: 

^(«) = ; ^{g) = \\g-y\l. 

Consider the triangulated domains approximating D, where / indexes the number of 

nodes N{1), so that we have C ■ ■ ■ C C D°^ := D, with sufficiently regular triangles. 
Consider a hnite element discretization on consisting of functions ■ In particular, 

continuous piecewise linear hat functions will be considered here, the explicit form of which will 
be given in section 5.1. Denote the corresponding space of functions of the form (p = 


^If ly —)■ oo it is important that they decay with a suitable rate in order to ensure u lives almost surely in an 
appropriate sequence-space, or equivalently u lives in the appropriate function-space. However, here we down-weight 
higher frequencies as necessary only to induce certain smoothness properties, while actually for a given value of 
u G E the resulting permeability u G E C C°°{D) C C{D) C L°°{D) C LP{D) for all p > 1. 
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by V\ and notice that C C ■ ■ ■ C C By making the further Assumption 7 of [14] that 
the weak solution p{-;u) of (13)-(14) for parameter value u is in the space W = H Hi <Z V , 
one obtains a well-dehned hnite element approximation p\-]u) of p{-;u). Thus, the sequence of 
distributions of interest in this context is: 


4 . 1 . Error Estimates 

Notice one can take the inner product of (13) with the solution p G V, and perform integration 
by parts on the right-hand side, in order to obtain {u'Vp, Vp) = (/, p). Therefore 

uMl = «*(Vp, Vp) < («Vp, Vp) = (/,p) < ||/|k*||p||y. (17) 

So the following bound holds in V, uniformly over u: 

( 18 ) 

Notice that: 

M ^ <2 M 

l^(p) -^(p')l = (X](^m,P-pT) < ||p-p'||y X] llPmllv* = C'||p-p'||y . (19) 

m=l m=l 

So the following uniform bound also holds: 

ie(p(st<))i<c'!khi. 

The uniform bound on Q provides the Lipschitz bound 

\^{Q)-^{Q')\<C\g-G'l (20) 

obtained as follows: 

|t((7)-4(e')l=l|ie-!/lf-ie'-!/lJ| 

= lieir-ie'lf+ 2(a'-e.!/)r| 

<(|6| + |6'l + 2 |y|)|r-'|| 6 - 6 '| , 

Setting G' = 0 gives the boundedness of <h. 

Considering some sequence hi indicating the maximum diameter of an individual element at 
level I, with hi ^ 0 (e.g. hi = 2“^), the following asymptotic bound holds for continuous piecewise 
linear hat functions [4]^ 

||p(-;m) -p'(-;m)||v' < Chi\\p{-;u)\\w ■ (21) 

Furthermore, Proposition 29 of [14] provides a uniform bound based on the following decomposition 
of (13): 

—Ap = + Vm ■ Vp) . 


^Higher order finite elements can yield stronger convergence rates, but will not be considered here in the interest 
of a more streamlined presentation. 


11 



Thus, we have 


sup„||p(-;m)||m/ < C"sup„||Ap(-;m)|| 

< —sup„ (ll/ll + ||M||y|b||y) 

< c \\ f \\, (22) 


where the hrst line holds by equivalence of norms, the second holds since u G by the triangle 
inequality and Cauchy-Schwarz inequality, and the last line holds by (18) and the fact ||/||y < c||/|| 
for some c. The constant C depends on u*, ||V^^||oo,C'^ and c . Note that ||u||y < ||Vu||oo < C" < 
oo by (15). Note that the bound (22) in (21) together with (18) provides a uniform bound over / 
for dehned by : u h-)■ following the same argument as (19), which means that the 

Lipschitz bound in (20) holds here over different I as well. 

Now, the following holds by (21), (22), (18), and the triangle inequality 

\\p\-]u)<Chi . (23) 

Hence, from (19) 

\g\u) - g^-\u)\ = \g(p\.;u))-gip^-\-;u))\ < Ch , (24) 

where C is independent of the realization of u. 

Proposition 4.1. For G'/_i(u) := exp{<l>(^^“^(M)) — ^{g\u))} one has the following estimates, 
uniformly in u: 

1 - 0{hi) = Cl := e-^^^ < = exp{$(6;'-') - <l>{g^)} < e^^‘ =:Ci = l + 0{hi). (25) 


Proof. In combination with (20), equation (24) gives the stated result. □ 

Proposition 4.2 (Bias). Let g G Bb{E). Then 


|E„l9(C/)|-E„„|!,(i7)]|<CAi. 


Proof. It follows from the same reasoning as in Proposition 4.1, upon observing that 




E 


riac 



□ 


4 . 2 . Verification of Assumptions 

Assumption (Al) is satished by letting 


C 


infC:, ; C 


SUpC; . 
i>l 


Notice that the asymptotic bounds of Proposition 4.1 imply that Cfi is increasing with I while Ci 
are decreasing with 1. Therefore, these will actually be minimum and maximum over a sufficiently 
large set of low indices. 
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Assumption (A2) can be shown to hold in this context, if a Gibbs sampler is constructed. Let 
9 be the uniform measure on [—1,1] and consider a probability measure n on E := 
with density w.r.t. the measure (S)^i 9{dui)\ 

exp{—$(«)} 


7r(u) = 


L exp{-$(M)} 0.^^ d{dui) 


where it is assumed that Vu G E, <h(u) G [0, <h*]. This is the setting above, for all /, following from 
equations (20) and (24). 

Let /c G N, /c < iL be given and consider a partition of [1,..., K] into k disjoint subsets ( 0 *)^^^. 
For example k = 2 and ai and 02 are the sets of (positive) odd and even numbers up to K, 
respectively. 

One can consider the Gibbs sampler to generate from tt, with kernel: 


K 


M{u,du') = 

.7=1 


u 


.Cl}^ , 


e{du[) 


i=l 


with 




u 




.ClfQ J 


One can, for example, perform rejection sampling on tt using the prior as a proposal (and accepting 
with probability exp{—$(«)}) and we would still have a theoretical acceptance probability of 


K 


exp{-<l>(M)}(^6'(dM i) > exp{—$*}. 


2=1 


Sampling from the full conditionals will have a higher-acceptance probability and thus the Gibbs 
sampler is not an unreasonable algorithm. 

Proposition 4.3. For any u,u ^ E 

M{u,du) > exp{—2<h*(A; — l)}M{u,du). 

Proof. Gonsider 


\'^av.aj-x ’ ^a.j+l.aky 






^ '.CLj ^ -^k ) 

< exp{2$*}. 


Thus, since 


and 


K 


M{u,du) = 

i=i 


9{dUi), 


2=1 

K 


M{u,du) = (n^(“aJ“ai:a,_i,«a,+i:aj) (^0{du[), 

j=l i=l 

and the hnal element in each product is identical, it follows that 

M{u,du) > exp{—2<F*(A; — l)}M{u,du'). 


as was to be proved. 


□ 


13 



5. Numerical Results 
5.1. Set-Up 

In this section a ID version of the elliptic PDE problem in (13) is considered. Let D = [0,1] 
and consider f{x) = lOOx. For the prior specification of u, set K = 2, u{x) = 0.15 = const.., 
(Ti = 0.1, 02 = 0.025, 0i(a;) = sin(7ra;) and 02(a^) = cos(27ra;). The forward problem at resolntion 
level I is solved using a finite element method with piecewise linear shape functions on a uniform 
mesh of width hi = for some starting k > 1 (so that there are at least two grid-blocks 

in the finest, / = 0, case). Thus, on the level the finite-element basis functions are 
defined as (for Xi = i ■ 2“*^^+^^) [4]: 


({l/hi)[x - {xi - hi)] if X E [xi - hi,Xi], 

^ I {l/hi)[xi-^ hi - x] if X E [xi,Xi-^ hi]. 

The functional of interest g is taken as the solution of the forward problem at the midpoint of the 
domain, that is g{u) = p{0.5;u). The observation operator is G(u) = [p(0.25;m),p(0.75;m)]''^, and 
the observational noise covariance is taken to be T = 0.25^/. 

To solve the PDE, the ansatz pfx) = Y^i=i is plugged into (13), and projected onto 

each basis element: 

-(v ■ (uV ^p\f)\{x)'^ = {f,Gj) , 

i=l 

resulting in the following linear system: 


A'(m)p* = f', 

where we introduce the matrix A\u) with entries A\-{u) = (tiVt/’i, Vt/’j), vectors with 
entries p\ and fl = (/,'^|), respectively. Omitting the index /, the matrix is sparse and tridiagonal 
with 

=-{l/h^) / u{x)dx , Aii = {l/h‘^)l u{x)dx + / u{x)dx\ , 

Jxi-i \Jxi-i J Xi / 

and zero otherwise. The elements /* are computed analogously. The system can therefore be solved 
with cost corresponding to a computational cost rate of 7 = 1 . 

To get some understanding about the numerics and validate the theory, a number of results and 
figures will be generated. First, the PDE solution is obtained for a reference value of m on a very 
fine mesh. This reference value of p is used to numerically obtain the rate (5 in upper bounds of 
the form hf for the quantities in (21), hence also in (23), over increasing /. Then, Ni are optimally 
allocated using this [3 and the 7 above using the formulae from Section 3.3. Following the error 
analysis in Section 4.1, once (3 has been decided, we have a = (3/2. Then, observing the cost/error 
trend for a range of errors e, we expect to observe the appropriate scaling between computational 
cost and mean squared error (e.g. MSE oc cost“^ for MLSMC). 

5.2. Results 

The following setting is simulated. The sequence of step-sizes is given by hi = k = 3. 

The data G(u) is simulated with a given n* ~ f/[—1,1] (i=l,2) and h = 2“^°. The observation 
variance and other algorithmic elements are as stated above. We will contrast the accuracy of 
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(N ^ 



Figure 1: An analyical calculation of \\pi{-]u) — pi-i{-;u)\\y, with u equal to the true value used 
to generate the data, for various choices of hi. 

two algorithms. The hrst is (i) MLSMC as detailed above; the second is (ii) plain SMC: the 
same sequence of distributions as MLSMC, but using equal number of particles for a given L, and 
averaging only the samples at the last level. For both MLSMC and SMC algorithms, random walk 
MCMC kernels were used (iterated 10 times) with scale parameters falling deterministically (the 
ratio of standard deviation used for target rji versus the one for target rji+i is set to (/ + 1)//). 

5.2.1. Numerical Estimation of Algorithmic Rates 

To numerically estimate the rate ft, the quantity \\pi{-]u) — p/_i(-; ■u)||y is computed over in¬ 
creasing levels 1. Figure 1 shows these computed values plotted against hi on base-2 logarithmic 
scales. A £t of a linear model gives rate /3 = 1.935, and a similar experiment gives a = 0.993. 
This is consistent with the rate (5 = 2 and a = (5/2 expected from the theoretical error analysis 
in Section 4.1 (and agrees also with other literatnre [4]). An expensive preliminary MLSMC is 
execnted to get some first results over the algorithmic variabilty. In this execntion the nnmber of 
particles are set with the recursion Ni = [2A')_|_i] and = 1000. The simulations are repeated 
100 times. The estimated variance of p^’-{gGi)/p^’-{Gi) — as a proxy of Vi, is plotted in 

Figure 2 against hi on the same scales as before. The estimate of the rate now is /3 = 5.06. In this 
case the nnmerical estimate is much stronger than the theoretical rate used here. In fact, under 
snitable regularity conditions one may theoretically obtain the rate /3 = 4 with a stronger L^{D) 
bonnd on ||p(-; u) —pi{-] u)||, which follows from an Aubin-Nitsche duality argument [11]. However, 
even this stronger estimate is still beat by the empirical estimate. Nonetheless, the objective of 
the present work is to illnstrate the theory and not to really optimize the implementation. In fact, 
similar resnlts as presented below are obtained using either rate, presnmably owing to the fact 
that /3 = 2 > (, which is already the optimal relationship of (5 and C, and hence already provides 
the optimal asymptotic behavior of MSEoccosf^. In case an optimal (5 indnces a change in the 
relationship between f5 and C, one may expect a change in asymptotic behavior of MSE vs. cost, 
which justifies such empirical rate estimation. 
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Figure 2: Variance estimates. 



5.2.2. Algorithmic Performance with Diminishing MSE 

Given the choices of a and (3 as above, the performance of the MLSMC algorithm is bench- 
marked by simulating samplers with different maximum levels L. The value of 'f]oo{g) was hrst 
estimated with the SMC algorithm targeting 7713 ( 5 ') with = 1000. This sampler was 

realized 100 times and the average of the estimator is take as the ground truth. The standard devia¬ 
tion is much smaller than the smallest bias of subsequent simulations. When updating L —)■ L -|- 1, 
the new bias is approximately a factor 2~°‘ smaller than the previous one. Therefore the two 
sources of error in (6) can be roughly balanced by setting iV/ = for I = 0,1,..., L, and 

Xo check the effectiveness of the MCMC steps employed for dispersing the 
particles within the SMC methods, we show in Figure 3 the average (over the number of particles) 
acceptance probability for each of the L iterations when the MCMC was executed (here L = 15). 
The plot indicates reasonable performance of this particular aspect of the sequential algorithm. 

The error-vs-cost plots for SMC and MLSMC are shown in Figure 4. Note that the bullets 
in the graph correspond to different choices of L (ranging from L = 0 to L = 5). Then, as 
mentioned earlier, for a given L, the single level SMC uses a hxed number of particles over the 
sequence of targets over / = 0 , 1 ,..., L, and this number is tuned to have approximately the same 
computational cost as MLSMC with the same L. The MSE data points are each estimated with 100 
realizations of the given sampler. The htted linear model of log MSE against log Cost has a gradient 
of —0.6493 and —1.029 for SMC and MLSMC respectively. This verihes numerically the expected 
asymptotic behavior MSEoccost“^ for MLSMC, determined from the theory. Furthermore, the 
hrst rate indicates that the single level SMC performs similarly to the single level vanilla MC with 
asymptotic behavior MSEoccosf^/^. The results clearly establish the potential improvements of 
MLSMC versus a standard SMC sampler. It is remarked that the MLSMC algorithm can be 
improved in many directions and this is subject to future work. 

Acknowledgements 

AJ, KL & YZ were supported by an AcRF tier 2 grant: R-155-000-143-112. AJ is affiliated with 
the Risk Management Institute and the Center for Quantitative Finance at NUS. RT, KL & AJ 


16 





Figure 3: Acceptance rates of MCMC kernel. 
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Appendix A. Technical Results 

Introduce the following notations. For 99 G Bb{E), p > 0 and rj G V^E) 

where Mp{(p){u) = Jp,ip{v)Mp{u,dv). Dehne the operator Qp+i{u,dv) = Gp{u)Mp^i{u,dv) and 
denote Qp,n{,^) = Qp+i{.' ■ ■ Qn(^)) (0 < p < n, Qn,n is the identity operator). Also set 

pf / \ _ Qp,n{<P ~ Vnj'P)) 

“ VpiQpA^)) ’ 

Dn,n is the identity operator, and dehne the following 






(A.l) 


with the convention that $ 0 (^ 7 ^! = Vo- Working similarly to the derivation of [9, Eq. (6.2)], but 

now with varying number of particles, we have that for any n > 0 

[vA - Vn]A) = [vA - ^niVnAAA) + [^n{VnA') “ Vn]A) 

VA-A) ^ VnA\Gr.-lMM) p^_,{G^_,M^{p)) 


VK 




Vn—1 (i-?n— 1 ) 


+ -Rn" H-Dn-l,n(V^)) + [vAl^ 
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Vn—l \ (F^n—l,n(V^)) 




Algorithm 


| —|mLSMc| —|sMc| —|CoSt ^ | —I Cost 



where notice that Dn- 2 ,n-iDn-i,n = Dn- 2 ,n- Thus, working iteratively we have that 



Vn]{^) = 

p=0 


Vp^{D. 


p^n 


(^)) 


n—1 




Y,R%{DM). 

p=0 


(A.2) 


Throughout this Section (T is a constant whose value may change, but does not depend on any 
time parameters of the Feynman-Kac formula, nor (iVo,..., 

Now a technical Lemma is introduced, which will contain results that are frequently used in 
the below calculations. 


Lemma Appendix A.l. Assume (Al-2). There exist C < +cx), k G (0,1) such that for any 
n>p>0,q>s>0, 1 < r < +oo and (pn, Pq G Bh{E): 


i) ||T)p,n(‘^n)||oo < ^||93n||oo- 

a) \\V^^ {Dp^n{.Pn))\\r < || (Pn || oo ■ 

ni) l|<;,(r'p,„(v>n))lk < ■ 

iv) \\Vp(D„.(ip„))Vp(D.,,(ip,))\\i < 

'«) \\Vp(D„(Vn))R'tUD.*uM)h < ‘"'‘"'"'"Jly"-"*’-"- . 
ii<:,(r'p+i.„(v„)X+.(-D.+i.»(v>,))iii < ‘"“"""Is:"*"*’'"' 


Proof. For (i). This follows from standard calculations in the analysis of Feynman-Kac formulae; 
see e.g. the proof of Proposition 2 in [16]. For (ii).This follows from [6, Lemma 7.3.3] and (i). For 
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(iii). Recall (A.l) and note that rip{Dp^n{<^n)) = 0; then on application of Cauchy-Schwarz and 
assumption (Al) one has that 

\\R^UDp,n{^^))\\, <C\\v^^{Dp,M)hr ■ llhp(Gp) -<^(Gp)||2. . 

The result follows from [6, Theorem 7.4.4] and (i). (iv) follows from Cauchy-Schwarz and (ii) • (v) 
follows from Cauchy-Schwarz, (ii) and (iii). (vi) follows from Cauchy-Schwarz and (iii). □ 

Recall equations (8) and (9), and dehne the following terms, 

Vn(9^,A^) — / ^ N) — / , ^p+l{^P,n{^)) ■ 

p=0 p=0 

Here we use a slight abuse of notation for N, representing {Nq, ..., Nn) (or (A^^o,..., Nn-i)). 

Lemma Appendix A.2. Assume (Al-2). There exist a C < -|-cx), k G (0,1) such that for any 
n > g > 0 and (pn,Tq,9 ^ l3b{E), ||g||oo = I- 


i) \E[A,{g,N,)Vn{Tn:N)V,{^,,N)]\ < _ 

ii) \E[A,{g,N,)Vn{iPn,N)n,{^„N)]\ < 

-/Vg 

iii) |E[A,(9,iv,)v,(.p„]v)7j„(43,„iv)]I < 

iv) |E[vl,( 9 ,iV,)K,(«j„iV)K„(,j„,iV)] I < 

Proof. Set as the a algebra generated by particle system up to time q. 

i) We start by noting that E {Dp,n{^n)) \ ] = 0 for any q < p < n, so that: 


E [ A,(g, N,)V„{v>„, N)V,{^„ iV) ] = E [ A,{g, N,)v;''’(D,„{p„))Vp(D,,,(v,,))] 

0<p,s<q 

As \Aq{g,Nq)\ < 1, one can use Lemma Appendix A.l (iv), to obtain the upper bound 

x—p+q—s 


E E|^^.4,(9,iV,)y''-(B,,„(,,„))l/''-(B.,,(43,))l<C||9.,.|U||9.,|U E 


0<p,s<q 


0<p,s<q 


< 


CIRr.llooIRqlloo^"-'^ 

Na 
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) Again using E {Dp,n{^n)) \ d^q ] = 0 for g < p < n, we have 


E|v 1,(9,/V,)V„(^„,/V)K,(9>„/V)|= E[^Vl,(9,/V,)l/'''(£>p,„(^„))flfA(C.,,(v,))] ■ 

0<p,5<g 

By \Aq{g,Nq)\ < 1 and Lemma Appendix A.l(v) we have the upper-bound 

5^ E|^A,(9,/V)y"»(£>,,„(^„))fl5i(-D.,,(v,))| <C|lv»IUIIv,IU 

^ C'||v5„||oo|Rg||oo«:"'“^ 

^ o /o 
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iii) By \Aq{g,Nq)\ < 1 and Lemma Appendix A.l(v), we have the upper bound 

n—1 q 

p=0 s=0 

n—1 q 

< C'||<dn||oo||<^q||oo EE - y/WqNl 

p=0 s=0 


iv) Working as in (iii), by \Ag{g, Ng)\ < 1 and Lemma Appendix A.l(vi), we have 


n—1 q—1 

E[Ag{g,Ng)ng{^g,N)nn{<fn,N)] | E ^ ^9, ^q) Rp+i{D n)) R^+l^D s ,g g)) 

p=0 s=0 

n—1 q 

P=0 s=0 


□ 


Lemma Appendix A.3. Assume (Al-2). There exists a C < +cx), such that for any n > 0, 
1 < r < +CX) and (p G Bb{E): 

Proof. The result is standard, but we give the proof for completeness. We have 


\\An{p,Nn)\\r = 


_ II (ipGri)-VnirG„) 


V>iG„) 


_|_ VnirGn) 
V^"{G„)pr.{Gn) 




Application of Minkowski, (Al) and [6, Theorem 7.4.4] complete the proof. 


□ 


Proposition Appendix A.l. Assume (Al-2). There exist aC < +cx), k G (0,1) such that for 
anyn>q>0 and pn,Pq,f,g e Bb{E), ||5(||oo = ||/||oo = L' 


E [An{g,Nn)Ag{f,Ng){g((- -gn){Tn){Vq'’ -Vq){Tq)] < (^ || || oo || || oo 

Proof. From the dehnition of An, An we have that 

E [An{g,Nn)Ag{f,Nq){g((^ - Vn)iPn)iVg‘‘ - Vq)iTq)] = 


^l,q,n{f, 9, Tq, Pm Ng, Nn) + ^2,ij,n(/; Pq, Pn, Nq, Nn) 


where we have dehned 


^l,q,nif:9:Pq:PmNg,Nn) = E [ An{g, Nn) Ag{f, Ng){g((^ “ hn) (<^n) (hf" - Vq){Pq)] : 
A2,g,n{f,Pq,PmNg,Nn) = ¥.[Ag{f, Ng){g((^ - gn){Pn){'nq’' - riq){Pq)] ■ 

By Lemma Appendix A.2 and the fact that < 1, we have that 


VnjgGP 

rniGp) 


^2,q,ni^f, Pqi Pn, Thq, Tin) E d/1| | oo || || oo (' 


Na 


+ 


n; 


3/2 


+ 


+ 


1 


N„N, 


)■ 
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Thus we concentrate on Ai^g^„(/, g, ipn, Nq, Nn). We have via (A.2): 

j 9t 9^ni ^qi ^qi A^n) 

= E [An{g, Nn)Aq{f, NqAVni^n, N) + iV))(V,(<^„ N) + TV)) ] . 

We will deal with each of the 4 terms on the R.H.S. separately. 

We start with E [An{g, Nn)Aq(f, Nq)Vni^m N)Vq{(pq, N) ] and work as follows, 


E [ AAg, NAAqif, Nq)Vni^n: N)Vqi^q, N) ] 


n q 


< 


P=0 s=0 ^ ^ 

a q—1 

.V._ n _ n V P ^ 


p=0 5=0 


n q 




p=0 s=0 


^n — p-\-q — s 

^yNpNs 


< 


C||yn||oo||yi;||c 


where for the third line we have used \Aq{g,Nq)\ < 1 and two applications of Holder’s inequality; 
for the forth line we have used Lemma Appendix A.l(ii) and Lemma Appendix A.3. 

Using very similar calculations one can obtain the npper bounds, 


E [ AAg, NAAqif, Nq)Vn{^n, iV)7^,(<^„ N) ] 
E [AAg, NAAqif, Nq)nn{^n, N)Vq{<fq, N) ] 
E [ An{g, N)Aq{f, N)nn{(Pn)Tlq{(Pq) ] 
The proof is now complete. 


< 

C^||¥^n||oo||9^q||oo 

Nr,Nq 

< 

C^llv^nllooll^^qlloo 


< 

C^llv^nllooll^^qlloo 



□ 
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